Abstract


This work developed new biological models of cortex and produced artificial neural systems for computation
which exploited the special capabilities of complex dynamics, and applied them to specific engineering
problems: handwritten character and word recognition, and grammatical inference.

In the later stages of the project we succeeded in demonstrating analytically and numerically how an
unusual cortical “sensory-motor” computing architecture can be constructed of recurrently interconnected
associative memory modules. Because intercommunicating modules of the architecture are analytically
guaranteed to store and recall multiple oscillatory and chaotic attractors, the architecture has served as a
framework in which to arrange and exploit the special capabilities of dynamic attractors. Modules with
oscillatory and chaotic attractors were successfully applied to the problem of handwritten character recognition
in early stages of the work.

The  modules  in  the  larger  architecture  can  learn  connection  weights  between  themselves  which  cause 
the  system  to  evolve  under  a  clocked  “machine  cycle”  by  a  sequence  of  transitions  of  attractors  within  the 
modules,  much  as  a  digital  computer  evolves  by  transitions  of  its  binary  flip-flop  states.  The  architecture  thus 
employs  the  principle  of  “computing  with  attractors”  used  by  macroscopic  systems  for  reliable  computation 
in  the  presence  of  noise. 

Superior  noise  immunity  was  demonstrated  for  these  systems  with  dynamic  attractors  over  systems  with 
static  attractors,  and  synchronization  between  coupled  periodic  or  chaotic  attractors  in  different  modules 
was  shown  to  be  important  for  effecting  reliable  transitions.  We  synchronized  chaotic  attractors  for  operation 
in  the  architecture  using  techniques  of  coupling  developed  for  secure  “broadspectrum”  communication  by  a 
modulated  chaotic  carrier  wave. 

We  constructed  a  system  which  learns  to  function  as  a  finite  state  automaton  that  perfectly  recognizes 
or  generates  the  infinite  set  of  six  symbol  strings  that  are  defined  by  a  Reber  grammar.  Even  though 
it  is  constructed  from  a  system  of  continuous  nonlinear  ordinary  differential  equations,  the  system  can 
operate  as  a  discrete-time  symbol  processing  architecture,  but  with  analog  input  and  oscillatory  subsymbolic 
representations. 

Most  recently  we  showed  how  the  architecture  can  learn  to  employ  selective  “attentional”  control  of 
synchronization  to  direct  the  flow  of  communication  and  computation  within  the  architecture  to  solve  a 
more  difficult  grammatical  inference  problem. 

This type of computing architecture and its learning algorithms for computation with oscillatory spatial
modes are ideal for implementation in optical systems, where electromagnetic oscillations, very high
dimensional modes, and high processing speeds are available.
1  Summary  of  Work  Done

Over the period of this grant, AFOSR-91-0325, entitled “Dynamical Systems, Neural Networks, and Cortical
Models”, we published 10 papers and one book chapter [5, 7, 8, 10, 9, 12, 11, 14, 13, 29, 15]. Fifteen
conferences were attended where oral or poster presentations were made, including nine invited talks around
the world in France, Japan, and Australia, as detailed in a later section.

The  mathematical  foundation  for  the  work  is  a  learning  algorithm  for  recurrent  analog  neural  networks, 
the  normal  form  projection  algorithm,  developed  at  Berkeley  on  this  grant.  It  allows  analytically 
guaranteed  associative  memory  storage  of  analog  patterns,  continuous  sequences,  and  chaotic  attractors  in 
the  same  network.  We  know  of  no  other  system  with  such  a  guarantee.  There  are  N  units  of  capacity  in  an 
N  unit  network.  It  costs  one  unit  per  static  attractor,  two  per  Fourier  component  of  each  periodic  sequence 
(oscillating  attractor),  and  three  (or  more)  per  chaotic  attractor.  There  are  no  spurious  attractors.  For 
periodic  sequences  there  is  a  Liapunov  function  which  governs  the  approach  of  transient  states  to  stored 
trajectories  [12]. 

We  showed  that  the  general  projection  learning  rule  reduced  to  a  Hebbian  outer  product  rule  for  storage 
of  orthogonal  patterns  in  a  biological  model  of  olfactory  cortex.  We  showed  further  that  a  model  of  coupled 
oscillatory  neural  populations  which  assumed  only  minimal  coupling  justified  by  known  anatomy  resulted  if 
the  oscillating  patterns  were  further  restricted  to  have  the  phase  structure  of  those  observed  experimentally 
in  olfactory  cortex  [12]. 

An alternative network for implementation of the projection algorithm, called the “projection network”
[9], was then developed. This network has $3N^2$ weights instead of $N^2 + N^4$ and is more useful for engineering
applications and for simulations of the biological model.

These  networks,  including  some  with  chaotic  attractors,  were  successfully  applied  to  the  problem  of  real 
time  handwritten  character  recognition.  This  is  still  the  first  system  we  know  of  that  can  accomplish  reliable 
pattern  recognition  with  exclusively  chaotic  dynamics  [7]. 

The  biological  foundations  of  the  network  were  deepened  when  we  showed  how  the  higher  order  weights  of 
the  biological  model  could  be  realized  by  synaptic  clusters  on  dendrites  in  the  neuropil  [5].  We  showed  as  well 
how  the  Hebbian  multiple  outer  product  rule  could  be  decomposed  to  reveal  a  term  called  “competition”  that 
could control attractor transitions. It could be realized biologically by a nucleus that summed all network
activations and fed back a global inhibition to all nodes of the network [5].

In the next project, we showed analytically and numerically how a cortical “sensory-motor” computing
architecture could be constructed of recurrently interconnected associative memory modules [10]. The
architecture is such that the larger system is itself a special case of the type of network of the modules, and
can be analysed with the same tools used to design the subnetwork modules. Because intercommunicating
modules of the architecture can store and recall multiple oscillatory and chaotic attractors, the architecture
serves as a framework in which to arrange and exploit the special capabilities of dynamic attractors [?].

The  modules  in  the  larger  architecture  learn  connection  weights  which  cause  the  system  to  evolve  under  a 
clocked  “sensory-motor  cycle”  by  a  sequence  of  transitions  of  attractors  within  the  modules,  much  as  a  digital 
computer  evolves  by  transitions  of  its  binary  flip-flop  attractors.  This  architecture  employs  the  principle 
of  “computing  with  attractors”  used  by  macroscopic  systems  for  reliable  computation  in  the  presence  of 
noise.  The  competition  parameter  mentioned  above  is  used  as  a  bifurcation  parameter  to  clock  the  attractor 
transitions  of  the  sensory-motor  cycle  [10]. 

We then constructed a discrete-time “simple recurrent” or “Elman” network architecture with oscillatory
modules.  The  time  steps  (machine  cycles)  of  the  system  hold  input  and  “context”  modules  clamped  at  their 
oscillatory  attractors  while  “hidden”  modules  change  state,  then  clamp  hidden  states  while  context  modules 
are  released  to  load  those  states  as  the  new  context  for  the  next  cycle  of  input  [14]. 

The  system  learned  to  function  as  a  finite  state  automaton  that  perfectly  recognizes,  or  alternatively 
generates,  the  infinite  set  of  six  symbol  strings  defined  by  a  Reber  grammar.  Even  though  it  is  constructed 
from  a  system  of  continuous  nonlinear  ordinary  differential  equations,  the  system  operates  as  a  discrete-time 
symbol  processing  architecture,  but  with  analog  input  and  oscillatory  subsymbolic  representations. 

Superior  noise  immunity  was  demonstrated  for  modules  with  dynamic  attractors  over  modules  with  static 
attractors,  and  synchronization  between  coupled  periodic  or  chaotic  attractors  in  different  modules  has  been 
shown  to  be  important  for  effecting  reliable  transitions.  We  synchronized  Lorenz  attractors  for  operation 
in  the  architecture  using  “control  of  chaos”  techniques  of  coupling  developed  for  secure  “broadspectrum”
communication  by  a  modulated  chaotic  carrier  wave  [13]. 

We  demonstrated  in  recent  analysis  and  simulations  that  sets  of  canonical  equations  for  the  Chua  circuit 
can  be  used  in  the  projection  network  [13].  We  have  found  attractors  in  the  Chua  family  which  out-perform 
the  Lorenz  attractor  in  our  handwritten  character  recognition  systems,  with  fast  competitive  suppression  at 
low  values  of  competitive  coupling. 

Most  recently  we  have  shown  that  the  Elman  architecture  can  learn  to  employ  selective  “attentional” 
control  of  synchronization  to  direct  the  flow  of  communication  and  computation  within  the  architecture  to 
solve  a  grammatical  inference  problem  [15]. 

In  this  architecture,  oscillation  amplitude  codes  the  information  content  or  activity  of  a  module  (unit), 
whereas phase and frequency are used to “softwire” the network. We have shown that only synchronized modules
communicate by exchanging amplitude information; the activity of non-resonating modules is shown to
contribute  noise.  The  same  hardware  and  connection  matrix  can  thus  subserve  many  different  computations 
and  patterns  of  interaction  between  modules. 

Synchronization control is modeled as a subset of the hidden modules with outputs which affect the
resonant frequencies of other hidden modules. They learn to perturb these frequencies to control synchrony
among  these  modules  and  direct  the  flow  of  computation  to  effect  transitions  between  subsections  of  a  large 
automaton  which  the  system  learns  to  emulate.  The  internal  crosstalk  noise  is  used  to  drive  the  required 
random  transitions  of  the  automaton. 


2  Introduction 

The goal of this work is to produce new systems for computation that exploit the special capabilities of complex
dynamics and apply them to specific engineering problems like handwritten character and word recognition,
speech recognition, grammatical inference, adaptive system identification and control, autonomous
robot navigation, and artificial intelligence.

Recordings of local field potentials have revealed 40 to 80 Hz oscillation in vertebrate cortex [24, 25].
The  amplitude  patterns  of  such  oscillations  have  been  shown  to  predict  the  olfactory  and  visual  pattern 
recognition  responses  of  a  trained  animal.  There  is  further  evidence  that  although  the  oscillatory  activity 
appears  to  be  roughly  periodic,  it  is  actually  chaotic  when  examined  in  detail.  This  preliminary  evidence 
suggests  that  oscillatory  or  chaotic  network  modules  may  form  the  cortical  substrate  for  many  of  the  sensory, 
motor,  and  cognitive  functions  now  studied  in  static  networks. 

It remains to be shown how networks with more complex dynamics can perform these operations and
what possible advantages are to be gained by such complexity. Our work has therefore culminated in the
construction of a parallel distributed processing architecture that is inspired by the structure and dynamics
of cerebral cortex, and in its application to the problem of grammatical inference. The construction views cortex
as  a  set  of  coupled  oscillatory  associative  memories,  and  is  guided  by  the  principle  that  attractors  must  be 
used  by  macroscopic  systems  for  reliable  computation  in  the  presence  of  noise. 

This system must function reliably in the midst of noise generated by crosstalk from its own activity.
Present  day  digital  computers  are  built  of  flip-flops  which,  at  the  level  of  their  transistors,  are  continuous 
dissipative  dynamical  systems  with  different  attractors  underlying  the  symbols  we  call  “0”  and  “1”.  In  a 
similar  manner,  the  network  we  have  constructed  is  a  symbol  processing  system,  but  with  analog  input  and 
oscillatory  subsymbolic  representations. 

Periodic or nearly periodic (chaotic) variation of a signal introduces additional degrees of freedom that
can  be  exploited  in  a  computational  architecture.  We  are  presently  investigating  the  design  principle  that 
selective  control  of  synchronization,  which  we  consider  to  be  a  model  of  “attention”,  can  be  used  to  control 
program  flow  in  an  architecture  with  dynamic  attractors. 

The architecture operates as a thirteen state finite automaton that generates the symbol strings of a Reber
grammar. It is designed to demonstrate and study the following issues and principles of neural computation:
(1) Sequential computation with coupled associative memories. (2) Computation with attractors for reliable
operation in the presence of noise. (3) Discrete time and state symbol processing arising from continuum
dynamics by bifurcations of attractors. (4) Attention as selective synchronization controlling communication
and temporal program flow. (5) Chaotic dynamics in some network modules driving random choice of
attractors in other network modules.

We believe that this type of computing architecture and its learning algorithms for computation with
oscillatory spatial modes are ideal for implementation in optical systems, where electromagnetic oscillations,
very  high  dimensional  modes,  and  high  processing  speeds  are  available.  The  mathematical  expressions  for 
optical  mode  competition  are  identical  to  our  normal  form  equations  for  oscillatory  mode  competition. 

To  advance  intuition  for  theoretical  analysis,  interactive  simulations  of  the  network  applications  have 
been  designed  on  the  SGI  4D35G  Personal  Iris  Graphics  Workstation.  These  allow  real  time  graphic  display 
of  network  dynamics  and  learning  as  parameters  are  varied. 


3  Mathematical  Background 

The  mathematical  foundation  for  the  current  project  is  a  learning  algorithm  for  recurrent  analog  neural 
networks,  the  normal  form  projection  algorithm,  developed  at  Berkeley  on  this  grant,  AFOSR-91-0325. 
It  allows  analytically  guaranteed  associative  memory  storage  of  analog  patterns,  continuous  sequences,  and 
chaotic  attractors  in  the  same  network. 


3.1  The  Projection  Algorithm 

A  key  feature  of  a  net  constructed  by  this  algorithm  is  that  the  underlying  dynamics  is  explicitly  isomorphic 
to  any  of  a  class  of  standard,  well  understood  nonlinear  dynamical  systems  -  a  “normal  form”  [27].  This 
control  over  the  dynamics  permits  the  design  of  important  aspects  of  the  network  dynamics  independent  of 
the  particular  patterns  to  be  stored.  Stability,  basin  geometry,  and  rates  of  convergence  to  attractors  can  be 
programmed  in  the  standard  dynamical  system. 

There  are  N  units  of  capacity  in  an  N  unit  network.  It  costs  one  unit  per  static  attractor,  two  per  Fourier 
component  of  each  periodic  sequence  (oscillating  attractor),  and  three  (or  more)  per  chaotic  attractor.  There 
are  no  spurious  attractors.  For  periodic  sequences  there  is  a  Liapunov  function  which  governs  the  approach 
of transient states to stored trajectories [4].
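
The capacity accounting above can be written out as a small helper. This is only an illustrative sketch: the names `capacity_used` and `fits` are hypothetical, and the cost of a chaotic attractor is taken at the stated minimum of three units unless the caller says otherwise.

```python
def capacity_used(n_static, fourier_components, n_chaotic, chaotic_cost=3):
    """Units of capacity consumed, using the costs stated in the text.

    n_static: number of static attractors (1 unit each).
    fourier_components: one entry per periodic sequence, giving its number
        of Fourier components (2 units per component).
    n_chaotic: number of chaotic attractors (chaotic_cost units each;
        three is the minimum given in the text).
    """
    periodic = sum(2 * k for k in fourier_components)
    return n_static + periodic + n_chaotic * chaotic_cost


def fits(n_units, n_static, fourier_components, n_chaotic, chaotic_cost=3):
    """True if the requested attractors fit in a network of n_units units."""
    used = capacity_used(n_static, fourier_components, n_chaotic, chaotic_cost)
    return used <= n_units
```

For example, four static attractors, two periodic sequences with one and three Fourier components, and two chaotic attractors use 4 + 8 + 6 = 18 units.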

We  make  particular  use  of  the  normal  form  for  the  Hopf  bifurcation  [27]  configured  as  a  simple  recurrent 
competitive  k-winner-take-all  network  with  a  cubic  nonlinearity,  shown  here  in  Cartesian  coordinate  form. 

$$\dot{v}_i = -\tau v_i + \sum_{j=1}^{N} J_{ij} v_j - v_i \sum_{j=1}^{N} A_{ij} v_j^2 \qquad (1)$$

This  model  dynamical  system  is  expressed  in  diagonalized  “overlap”  or  “memory  coordinates”  (one  memory 
per  k  nodes).  Matrix  J  is  at  the  disposal  of  the  experimenter:  A  diagonal  matrix  with  real  eigenvalues 
determines  static  attractors;  alternatively,  periodic  attractors  are  obtained  if  J  is  the  real  form  of  a  complex 
diagonal matrix with positive real parts $\alpha$. This causes initial states to move away from the origin, until
the  competitive  (negative)  cubic  terms  dominate  at  some  distance,  causing  the  flow  to  be  inward  from  all 
points  beyond.  The  off-diagonal  cubic  terms  create  competition  between  directions  of  flow  within  a  spherical 
middle  region  and  thus  create  multiple  attractors  and  basins.  The  larger  the  eigenvalues  in  J  and  off-diagonal 
weights  in  A,  the  faster  the  convergence  to  attractors  in  this  region.  For  temporal  patterns,  these  nodes 
come  in  complex  conjugate  pairs  which  supply  Fourier  components  for  trajectories  to  be  learned.  Many 
types  of  dynamics  have  been  implemented  by  specializing  A  and  J,  including  static  attractors,  limit  cycles, 
and  chaotic  attractors.  Chaotic  dynamics  may  be  created  by  specific  programming  of  the  interaction  of  two 
pairs  of  these  nodes. 
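
The competitive behaviour described above can be seen in a few lines of simulation. The sketch below integrates the normal form with a cubic nonlinearity, dv_i/dt = -tau v_i + sum_j J_ij v_j - v_i sum_j A_ij v_j^2, for the static case with one memory per node (k = 1). All numerical values (tau, the entries of J and A, the step size, the initial state) are illustrative assumptions, and forward Euler is used only for brevity.

```python
def normal_form_rhs(v, J, A, tau):
    """Right-hand side of the normal form in memory coordinates."""
    n = len(v)
    return [
        -tau * v[i]
        + sum(J[i][j] * v[j] for j in range(n))
        - v[i] * sum(A[i][j] * v[j] ** 2 for j in range(n))
        for i in range(n)
    ]


def relax(v0, J, A, tau, dt=0.01, steps=5000):
    """Forward-Euler relaxation from the initial state v0."""
    v = list(v0)
    for _ in range(steps):
        dv = normal_form_rhs(v, J, A, tau)
        v = [vi + dt * dvi for vi, dvi in zip(v, dv)]
    return v


n = 3
tau = 0.5
# Diagonal J with real eigenvalues: static attractors, net growth rate 1.0.
J = [[1.5 if i == j else 0.0 for j in range(n)] for i in range(n)]
# Off-diagonal competition stronger than self-damping: winner-take-all.
A = [[1.0 if i == j else 2.0 for j in range(n)] for i in range(n)]

v_final = relax([0.3, 0.2, 0.1], J, A, tau)
```

With these choices the node with the largest initial activation converges to unit amplitude and the other two are suppressed to zero, which is the winner-take-all behaviour the text attributes to the off-diagonal cubic terms.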

The rule for learning desired distributed spatial or spatio-temporal patterns can be shown to be equivalent
to the operation of “projecting” sets of these nodes into “network” coordinates (the standard basis) using
the desired vectors as corresponding columns of a transformation matrix P. The differential equations of
the recurrent network itself may be viewed as linearly transformed or “projected”, leading to new recurrent
network equations with general coupling $T_{ij}$ for the linear terms, and general higher order weights $T_{ijkl}$ for
the cubic terms [1]. The projection learning rule is

$$T = P J P^{-1} \quad \text{and} \quad T_{ijkl} = \sum_{m,n=1}^{N} P_{im} A_{mn} P^{-1}_{mj} P^{-1}_{nk} P^{-1}_{nl}, \qquad (2)$$

and, in the general case, the normal form (1) is projected to become

$$\dot{x}_i = -\tau x_i + \sum_{j=1}^{N} T_{ij} x_j - \sum_{j,k,l=1}^{N} T_{ijkl} x_j x_k x_l. \qquad (3)$$
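
The projection rule (2) can be checked numerically: if x = P v, the network equations (3) evaluated at x should equal P applied to the normal form (1) evaluated at v. The sketch below does this with NumPy. The matrices P, J and A are arbitrary illustrative values (any invertible P would do), and the normal form is assumed to have the cubic term v_i sum_j A_ij v_j^2.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
tau = 0.5

# Illustrative data, not taken from the report.
P = rng.normal(size=(N, N)) + 3.0 * np.eye(N)   # columns = patterns to store
J = np.diag([1.5, 1.5, 1.2, 1.2])               # linear part, memory coordinates
A = np.full((N, N), 2.0) - np.eye(N)            # 1 on the diagonal, 2 off it

P_inv = np.linalg.inv(P)

# Learning rule (2): T = P J P^-1 and
# T_ijkl = sum_mn P_im A_mn P^-1_mj P^-1_nk P^-1_nl
T = P @ J @ P_inv
T4 = np.einsum("im,mn,mj,nk,nl->ijkl", P, A, P_inv, P_inv, P_inv)


def normal_form_rhs(v):
    """Normal form (1) in memory coordinates."""
    return -tau * v + J @ v - v * (A @ v**2)


def network_rhs(x):
    """Projected network equations (3) in network coordinates."""
    cubic = np.einsum("ijkl,j,k,l->i", T4, x, x, x)
    return -tau * x + T @ x - cubic


v = rng.normal(size=N)
x = P @ v
lhs = network_rhs(x)            # dx/dt computed in network coordinates
rhs = P @ normal_form_rhs(v)    # dv/dt mapped through P
```

The two vectors `lhs` and `rhs` agree to rounding error, which is the sense in which the network dynamics are a linear change of coordinates of the normal form. The full tensor `T4` has N^4 entries, so this direct form is practical only for small N.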

3.1.1  Chaotic  attractors 

We use the well studied family of Chua attractors [30, 20], implemented by the Chua hardware circuit [19],
to investigate the effectiveness of complex dynamics in this associative memory. This is a physical system
whose mathematical model is capable of duplicating most experimentally observed chaotic and bifurcation
phenomena, and which has yielded to a mathematical treatment.

The canonical Chua equations give access to a rich variety of dynamics to explore in the system. Recent
work in the Chua group has revealed that more than 30 chaotic attractors can be obtained from the canonical
equations for the Chua circuit by variation of 7 parameters [18]. These include types similar to those
that have been studied in the literature and that we have previously employed, such as the Lorenz and Rössler
attractors.

We  have  demonstrated  in  recent  analysis  and  simulations  that  sets  of  canonical  equations  for  the  Chua 
circuit  in  three  dimensional  subspace  blocks  can  be  used  in  the  projection  network.  Multiple  Chua  attractors 
have  been  created  simply  by  adding  off-diagonal  normal  form  competitive  terms  to  couple  sets  of  the  three 
Chua  equations, 

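Case I: $RC_2 > 0$ (4)-(7)

$$\dot{v}_1 = \alpha \left( v_2 - v_1 - f(v_1) \right) - v_1 \sum_{j=1}^{N} A_{1j} v_j^2$$

$$\dot{v}_2 = v_1 - v_2 + v_3 - v_2 \sum_{j=1}^{N} A_{2j} v_j^2$$

$$\dot{v}_3 = -\beta v_2 - \gamma v_3 - v_3 \sum_{j=1}^{N} A_{3j} v_j^2$$

Case II: $RC_2 < 0$ (8)-(11)

$$\dot{v}_1 = \alpha \left( -v_2 + v_1 + f(v_1) \right) - v_1 \sum_{j=1}^{N} A_{1j} v_j^2$$

$$\dot{v}_2 = -v_1 + v_2 - v_3 - v_2 \sum_{j=1}^{N} A_{2j} v_j^2$$

$$\dot{v}_3 = \beta v_2 + \gamma v_3 - v_3 \sum_{j=1}^{N} A_{3j} v_j^2$$

where

$$f(v_1) = b v_1 + \tfrac{1}{2}(a - b)\left( |v_1 + 1| - |v_1 - 1| \right).$$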
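
As an illustration, the piecewise-linear nonlinearity and one three-dimensional Chua block with its competitive terms can be written as below. This is a sketch that assumes the standard dimensionless form of the Chua equations, in which the first component is driven by alpha (v2 - v1 - f(v1)) in Case I and all uncoupled terms change sign in Case II. The function names and the `competition` argument are hypothetical, and no parameter values are taken from the report.

```python
def chua_f(v1, a, b):
    """Piecewise-linear Chua nonlinearity: slope a for |v1| <= 1, slope b outside."""
    return b * v1 + 0.5 * (a - b) * (abs(v1 + 1.0) - abs(v1 - 1.0))


def chua_block_rhs(v, alpha, beta, gamma, a, b,
                   competition=(0.0, 0.0, 0.0), case=1):
    """Right-hand side of one three-dimensional Chua block.

    competition[i] stands for sum_j A_ij * v_j**2 as seen by component i.
    It is supplied by the caller because it depends on the state of the
    other blocks in the network.  case=1 is RC2 > 0 and case=2 is RC2 < 0;
    the case only flips the sign of the uncoupled Chua terms.
    """
    v1, v2, v3 = v
    k = 1.0 if case == 1 else -1.0
    chua = (
        k * alpha * (v2 - v1 - chua_f(v1, a, b)),
        k * (v1 - v2 + v3),
        k * (-beta * v2 - gamma * v3),
    )
    # Competitive normal form term: -v_i * sum_j A_ij v_j^2
    return tuple(c - vi * s for c, vi, s in zip(chua, v, competition))
```

With `competition` left at zero the block reduces to an uncoupled Chua system, so the same routine can be used to check an attractor in isolation before coupling several blocks through A.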
Figure 1: Projection Network with $3N^2$ weights. The A matrix programs a k-winner-take-all net which
determines attractors, basins of attraction, and rates of convergence. The columns of P contain the output
patterns associated to these attractors. The rows of $P^{-1}$ determine category centroids.


The  A  matrix  here  has  constant  off  diagonal  coupling,  and  three  dimensional  blocks  of  zeros  along  the 
diagonal  to  remove  the  self-damping  terms  from  the  subspaces  containing  the  Chua  equations. 

If we use the cubic nonlinearity

$$f(v_1) = a_0 v_1 + a_1 v_1^3,$$

then these equations are of the same general form as the Hopf normal form (with a different linear part),
and they may be projected into network coordinates by the learning rule (2) exactly as shown above to give
network equations (3). The matrix J now contains the linear coupling terms of the Chua equations in three
dimensional blocks along the diagonal, and zeros everywhere else. The diagonal blocks in J have the
following form (shown for Case I),

$$\begin{pmatrix} -\alpha(1 + a_0) & \alpha & 0 \\ 1 & -1 & 1 \\ 0 & -\beta & -\gamma \end{pmatrix}$$

Here the matrix A is as described above, except for the diagonal slot $A_{11} = \alpha a_1$ from the cubic nonlinear
term $\alpha a_1 v_1^3$ of the Chua equations.


3.2  Projection  Network 

An alternative network for implementation of the projection algorithm is the “projection network” [7]. In
the projection net, the algebraic projection operation into and out of memory coordinates is done explicitly
by a set of weights in two feedforward linear networks characterized by weight matrices $P^{-1}$ and P. These
map  inputs  into  and  out  of  the  nodes  of  the  recurrent  dynamical  network  in  memory  coordinates  sandwiched 
between  them.  This  kind  of  network,  with  explicit  input  and  output  projection  maps  that  are  inverses,  may 
be  considered  an  “unfolded”  version  of  the  purely  recurrent  networks  described  above. 

The autoassociative case of this network is formally equivalent to the higher order network realization
used above as a biological model [3]. All the mathematical results proved for the projection algorithm in
this case carry over to this new architecture, but more general versions can be trained and applied in novel
ways. The new network has $3N^2$ weights instead of $N^2 + N^4$ and is more useful for engineering applications
and for simulations of the biological model. The input and output weights could be stored off-chip in a
conventional memory, and the fixed weights of the dynamic normal form network could be implemented in
analog VLSI for fast analog relaxation.

This network is shown in figure 1. Input pattern vectors $x'$ are applied as pulses which project onto each
vector of weights (row of the $P^{-1}$ matrix) on the input to each unit i of the dynamic network to establish an
activation level $v_i$ which determines the initial condition for the relaxation dynamics of this network. The
recurrent weight matrix A of the dynamic network can be chosen so that the unit or predefined subspace
of units which receives the largest projection of the input will converge to some state of activity, static or
dynamic, while all other units are suppressed to zero activity.

Given prototype patterns to be stored, a matrix inversion determines network weights. For nearly orthogonal
patterns, matrix transposition is used instead. Unsupervised or supervised incremental learning
algorithms for pattern classification, such as competitive learning or bootstrap Widrow-Hoff, can easily be
implemented. We have well behaved simulations containing multiple static, oscillatory, and chaotic attractors
in  different  competing  subspaces  of  the  same  network. 
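
The three stages of the projection network (input projection, competitive relaxation, output projection) can be sketched for the simplest case of static attractors and orthonormal prototypes, where transposition stands in for the matrix inverse as noted above. The prototype patterns, the input vector, and the competition constants below are illustrative assumptions, and `recognize` is a hypothetical name.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def relax(v, rate=1.0, c=1.0, d=2.0, dt=0.01, steps=4000):
    """Euler relaxation of dv_i/dt = rate*v_i - v_i*(c*v_i^2 + d*sum_{j!=i} v_j^2).

    With d > c this is a winner-take-all special case of the normal form.
    """
    for _ in range(steps):
        total = sum(x * x for x in v)
        v = [x + dt * x * (rate - c * x * x - d * (total - x * x)) for x in v]
    return v


def recognize(x, P):
    """Classify input x; columns of P are the stored prototype patterns."""
    v0 = matvec(transpose(P), x)      # input projection sets the initial condition
    v = relax(v0)                     # competition selects one memory node
    winner = max(range(len(v)), key=lambda i: abs(v[i]))
    return winner, matvec(P, v)       # output projection recalls the prototype


# Two orthonormal prototypes in a four-dimensional input space.
P = [[0.5, 0.5],
     [0.5, -0.5],
     [0.5, 0.5],
     [0.5, -0.5]]

winner, output = recognize([0.6, 0.3, 0.5, 0.6], P)
```

Here the distorted input projects most strongly onto the first prototype, so the first memory node wins the competition and the output is the clean first column of P.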

3.2.1  Handwritten  Character  Recognition 

Using  the  projection  architecture,  an  effective  real  time  handwritten  character  recognition  system  with  mouse 
input  of  characters  and  on  line  learning  has  been  developed.  Various  options  allow  the  system  to  utilize  static, 
oscillatory, and/or chaotic attractors (Lorenz, Rössler, Ruelle-Takens, Chua, etc.). This is the first system
we  know  of  that  can  accomplish  reliable  pattern  recognition  with  exclusively  chaotic  dynamics  [7]. 

Handwritten  characters  have  a  natural  scale  and  translation  invariant  analog  representation  in  terms  of 
a  sequence  of  angles  that  parametrize  the  pencil  trajectory.  We  remove  the  writing  velocity  variation  from 
the  raw  set  of  (x,y)  samples  of  a  digit  trajectory  by  interpolating  a  new  set  of  N  (x,y)  points  equally  spaced 
along  the  curve.  A  vector  of  the  sines  and  cosines  of  the  angles  from  the  horizontal  made  by  the  line  segment 
emanating  from  each  point  then  becomes  the  preprocessed  representation  of  the  digit  to  be  learned  or  input 
to  the  network  for  recognition. 
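
The preprocessing described above can be sketched as follows. The raw trajectory is resampled at equal arc-length steps by linear interpolation, and each resulting segment contributes the cosine and sine of its angle from the horizontal. The function names are hypothetical, and the choice of n segments from n + 1 resampled points (giving a feature vector of length 2n) is an assumption, since the text does not say how the last point is handled.

```python
import math


def resample_by_arc_length(points, n):
    """Return n points equally spaced along the polyline through `points`."""
    # Cumulative arc length at each raw sample.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out, seg = [], 0
    for k in range(n):
        s = total * k / (n - 1)
        # Advance to the raw segment that contains arc length s.
        while seg < len(points) - 2 and cum[seg + 1] < s:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0.0 else (s - cum[seg]) / span
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def angle_features(points, n):
    """Cosines and sines of the n segment angles of the resampled stroke."""
    pts = resample_by_arc_length(points, n + 1)
    feats = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        theta = math.atan2(y1 - y0, x1 - x0)
        feats.extend((math.cos(theta), math.sin(theta)))
    return feats
```

Because only segment directions are kept, scaling or translating the stroke leaves the feature vector unchanged, and the equal arc-length resampling removes the dependence on how fast each part of the stroke was drawn.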

Learning  in  this  system  can  be  as  fast  as  recognition.  We  have  seen  that  the  projection  algorithm  supplies 
a formula that allows one shot “learning” of prototypes and immediately establishes a basin of attraction
that  determines  the  generalization  response  of  the  network  to  future  inputs.  Using  a  projection  network 
architecture  with  32  attractors  for  digits  and  lower  case  letters,  where  the  input  vector  is  of  dimension  N  = 
64,  with  only  this  one  shot  learning  of  prototypes,  and  very  small  databases  for  a  single  writer  at  a  time,  all 
attractors  allow  roughly  95%  correct  recognition  responses.  Unexpected  properties  have  been  found  in  the 
systems  utilizing  chaotic  attractors.  Chaotic  attractors,  for  example  have  different  basins  of  attraction  from 
static  or  periodic  attractors. 

In  the  projection  network,  or  its  folded  biological  version,  the  chaotic  attractors  have  a  basin  of  attraction 
in the N dimensional state space that constitutes a category, just like any other attractor in the system. There
may be computational advantages to the basins of attraction (categories) produced by chaotic attractors, or
to  the  effects  their  outputs  have  as  inputs  to  other  network  modules. 

In the projected or folded network coordinates, the particular distributed N dimensional spatio-temporal
patterns  learned  for  the  four  components  of  the  chaotically  paired  oscillatory  modes,  or  the  three  components 
of  the  Lorenz  system,  form  a  coordinate-specific  “encoding”  of  the  chaotic  attractor,  which  may  constitute  a 
recognizable  input  to  another  network,  if  it  falls  within  some  learned  basin  of  attraction.  While  the  details  of 
the  trajectory  of  a  chaotic  attractor  in  any  physical  continuous  dynamical  system  are  lost  in  the  noise,  there 
is  still  a  particular  structure  to  the  attractor  which  is  a  recognizable  “signature”.  This  allows  communication 
and  “recognition”  of  chaotic  attractors. 

We  have  found  attractors  in  the  Chua  family  which  out-perform  the  Lorenz  attractor  in  our  handwritten 
character  recognition  systems,  with  fast  competitive  suppression  at  low  values  of  competitive  coupling. 
Figure 2: Biological subnetwork of excitatory cell populations $x_i$, inhibitory cell populations $y_i$, inputs $b_i$,
adaptive excitatory to excitatory connections $W_{ij}$, and constant local inhibitory feedback connections $g$ and $-h$.


3.3  Biological  Associative  Memory  Module 


We  have  determined  a  biologically  “minimal”  model  that  is  intended  to  assume  the  least  anatomically 
justified  coupling  sufficient  to  allow  function  as  an  oscillatory  associative  memory.  The  network  shown  in 
figure  2  is  meant  only  as  a  cartoon  of  the  real  biology,  which  is  designed  to  reveal  the  general  mathematical 
principles  and  mechanisms  by  which  the  actual  system  might  function. 

Long range excitatory to excitatory connections are well known as “associational” connections in olfactory
cortex [28] and cortico-cortical connections in neocortex. Since our units are neural populations, we can expect
that some density of full cross-coupling exists in the system [28], and our weights are taken to be the average
synaptic strengths of these connections. Local inhibitory “interneurons” are a ubiquitous feature of the
anatomy of cortex [26, 23]. It is unlikely that they make long range connections (> 1 mm) by themselves.
These  connections,  and  even  the  debated  interconnections  between  them,  are  therefore  left  out  of  a  minimal 
coupling  model.  The  resulting  network  is  a  fair  cartoon  of  the  well  studied  circuitry  of  olfactory  (pyriform) 
cortex.  Since  almost  all  of  cortex  has  this  type  of  structure  in  the  brains  of  amphibia  and  reptiles,  our 
super-network  of  these  submodules  has  the  potential  to  become  a  reasonable  caricature  of  the  full  cortical 
architecture  in  these  animals.  Although  the  neocortex  of  mammals  is  more  complicated,  we  expect  the  model 
to  provide  useful  suggestions  about  the  principles  of  oscillatory  computation  there  as  well. 

For  an  N  dimensional  system,  this  minimal  coupling  structure  is  described  mathematically  by  the  matrix 

$$T = \begin{pmatrix} W & -hI \\ gI & 0 \end{pmatrix} \qquad (12)$$


W is the N/2 x N/2 matrix of excitatory interconnections, and gI and hI are N/2 x N/2 identity matrices
multiplied by the positive scalars g and h. These give the strength of coupling around local inhibitory feedback
loops. A state vector is composed of local average cell voltages for N/2 excitatory neuron populations
x and N/2 inhibitory neuron populations y. Intuitively, since the inhibitory units receive no direct input
and give no direct output, they act as hidden units that create oscillation for the amplitude patterns stored
in the excitatory cross-connections W. This may perhaps be viewed as a specific structural addition to a
recurrent analog higher order network architecture to convert its static attractors to periodic attractors.
Here the symmetric sigmoid functions of such a network are Taylor expanded up to cubic terms with third
order weights (quadratic terms are killed by the symmetry). Network equations with the first order coupling
(12) shown above, plus these higher order excitatory synapses, are shown below, in component form.

$$\dot{x}_i = -\tau x_i - h y_i + \sum_{j=1}^{N/2} W_{ij} x_j - \sum_{j,k,l=1}^{N/2} W_{ijkl}\, x_j x_k x_l + b_i \qquad (13)$$

$$\dot{y}_i = -\tau y_i + g x_i \qquad (14)$$

The competitive (negative) cubic terms of (13) constitute a directly programmable nonlinearity that is independent
of the linear terms. Normal form theory shows that these cubics are the essential nonlinear terms
required  to  store  oscillations,  because  of  the  (odd)  phase  shift  symmetry  required  in  the  vector  field.  They 
serve  to  create  multiple  periodic  attractors  by  causing  the  oscillatory  modes  of  the  linear  term  to  compete, 
much  as  the  sigmoidal  nonlinearity  does  for  static  modes  in  a  network  with  static  attractors  [1,4].  Intuitively, 
these  terms  may  be  thought  of  as  sculpting  the  maxima  of  a  “saturation”  (energy)  landscape,  into  which 
the  modes  with  positive  eigenvalues  expand,  and  positioning  them  to  lie  in  the  directions  specified  by  the 
eigenvectors  to  make  them  stable.  A  Liapunov  function  for  this  landscape  may  be  explicitly  constructed 
in  a  special  polar  coordinate  system  [1,  4].  We  use  this  network  directly  as  our  biological  model.  From  a 
physiological  point  of  view,  (13)  may  be  considered  a  model  of  a  biological  network  which  is  operating  in  the 
linear  region  of  the  known  axonal  sigmoid  nonlinearities,  and  contains  instead  these  higher  order  synaptic 
nonlinearities. 
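
The way the inhibitory loop in (12) turns the modes of W into oscillatory modes can be checked numerically. The following is a minimal sketch, not the authors' code: the values of W, g and h are hypothetical, and the helper name `coupling_matrix` is introduced here for illustration. For an eigenvector of W with eigenvalue alpha, the block structure of T gives the pair of eigenvalues (alpha ± sqrt(alpha^2 - 4gh))/2, which is complex (oscillatory) whenever alpha^2 < 4gh.

```python
import numpy as np

def coupling_matrix(W, g, h):
    """Assemble the block matrix T of (12) from W and the scalars g and h."""
    n = W.shape[0]
    identity = np.eye(n)
    return np.block([[W, -h * identity], [g * identity, np.zeros((n, n))]])

# Hypothetical example values (not taken from the source).
W = np.diag([0.5, 1.0, 1.5])   # excitatory weights with eigenvalues alpha
g, h = 2.0, 2.0                # inhibitory loop gains, so 4*g*h = 16

T = coupling_matrix(W, g, h)
eigenvalues = np.linalg.eigvals(T)

# Each eigenvalue alpha of W yields the pair (alpha +/- sqrt(alpha^2 - 4gh)) / 2.
alphas = np.diag(W)
root = np.sqrt(alphas**2 - 4 * g * h + 0j)   # complex square root
predicted = np.concatenate([(alphas + root) / 2, (alphas - root) / 2])
```

The uniform decay term of (13) and (14) shifts every eigenvalue by the same real amount, so it changes which modes grow but not their frequencies.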

Adding  the  higher  order  weights  corresponds,  in  connectionist  language,  to  increasing  the  complexity 
of  the  neural  population  nodes  to  become  “higher  order”  or  “sigma-pi”  units.  Clusters  of  synapses  within 
a population unit can locally compute products of the activities on incoming primary connections Wij,
during higher order Hebbian learning, to establish a weight Wijkl (see figure 3). These secondary higher
order synapses are then used in addition to the synapses Wij, during operation of the overall network, to
weight  the  effect  of  triple  products  of  inputs  in  the  output  summation  of  the  population. 
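
The output summation of such a sigma-pi unit can be written compactly as a tensor contraction. This is an illustrative sketch only: the random weights are placeholders, the function name `sigma_pi_output` is hypothetical, and the minus sign on the cubic term follows the competitive convention of (13).

```python
import numpy as np

def sigma_pi_output(W2, W4, x):
    """Summed input to each population i:
    sum_j W2[i,j] x[j]  -  sum_jkl W4[i,j,k,l] x[j] x[k] x[l]."""
    first_order = W2 @ x
    third_order = np.einsum("ijkl,j,k,l->i", W4, x, x, x)
    return first_order - third_order

rng = np.random.default_rng(0)
n = 4                                  # number of excitatory populations
W2 = rng.normal(size=(n, n))           # placeholder primary weights
W4 = rng.normal(size=(n, n, n, n))     # placeholder higher order weights
x = rng.normal(size=n)                 # placeholder activities
out = sigma_pi_output(W2, W4, x)
```

A dense fourth order tensor has n^4 entries, which is one way to see why the number of higher order weights limits the size of a patch.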

Using  only  the  long  range  excitatory  connections  Wij  available,  some  number  of  the  higher  order  synaptic 
weights Wijkl could be realized locally within a neural population in the axo-dendritic interconnection plexus
known as “neuropil” [3]. Only (N/2)^2 of these (N/2)^4 possible higher order weights are required in principle
to  approximate  the  performance  of  the  projection  algorithm  [6].  The  size  of  our  cortical  patches  is  limited 
by  this  number,  and  is  itself  motivation  for  modularity. 


3.3.1  Hebbian  Learning 


The minimal network coupling (12) for T results from the projection learning rule (2) when a specific
biological form is chosen, in the columns P^s of P, for the patterns to be stored. Only the higher order weights
Wijkl between excitatory populations shown in the biological module (13) are required for approximate
pattern storage [6]. This special complex form for P^s and the corresponding asymptotic solutions X^s(t)
established are,


$$P^s \;\Rightarrow\; X^s(t) \qquad (15)$$


The phases $\theta_x$ and $\theta_y$ are constant over the components of each kind of neural population, x and y, and differ
only between them. This is basically what is observed in the olfactory bulb (primary olfactory cortex) and
prepyriform cortex [3]. The phase of inhibitory components $\theta_y$ in the bulb lags the phase of the excitatory
components $\theta_x$ by approximately 90 degrees.

A  Hebbian  learning  rule  may  be  derived  from  the  projection  learning  rule  which  allows  a  network  to 
learn  its  attractor  categories  by  local  self  organization  of  synapses  and  synaptic  clusters  according  to  pre 
and  post  synaptic  activities  experienced  during  external  input  forcing.  For  orthonormal  static  patterns  P, 
$P^{-1} = P^T$, and the projection rule for the W matrix reduces to an outer product, or “Hebb” rule, and the
projection  for  the  higher  order  weights  becomes  a  multiple  outer  product  rule: 


$$W_{ij} = \sum_{s=1}^{N/2} x_i^s x_j^s, \qquad W_{ijkl} = c\,\delta_{ij}\delta_{kl} - d \sum_{s=1}^{N/2} x_i^s x_j^s x_k^s x_l^s \qquad (16)$$


Figure  3:  Neural  population  subnetwork  acting  as  a  sigma-pi  unit.  It  uses  secondary  higher  order  synaptic 
weights Wijkl on products of the activities of incoming primary connections Wij, and receives a global
inhibitory  bias. 


When the Hebbian learning rule (16) is used, the higher order weights Wijkl of the network model (13)
can be decomposed so that (13) becomes,

$$\dot{x}_i = -\tau x_i - h y_i + \sum_{j=1}^{N/2} W_{ij} x_j + d \sum_{j,k,l=1}^{N/2} W_{ijkl}\, x_j x_k x_l - c\, x_i \sum_{j=1}^{N/2} x_j^2 + b_i \qquad (17)$$

where Wijkl now denotes the multiple outer product in (16), and $-c\,x_i \sum_j x_j^2$ comes from the $c\,\delta_{ij}\delta_{kl}$ term
in (16). This single weight negative term corresponds to a shunting inhibitory bias which depends on the
global  excitatory  activity  of  the  network.  “Shunting”  here  means  multiplied  by  the  current  average  cell 
voltage  Xi  of  population  i.  This  is  an  input  which  is  identical  for  all  excitatory  neural  populations  and  could 
be  calculated  by  a  single  node  of  the  network  which  receives  input  from  all  excitatory  populations,  as  shown 
in  figure  3.  Such  a  node  might  correspond  to  one  of  the  nuclei  which  lie  below  the  prepyriform  cortex.  These 
send  and  receive  the  diffuse  projections  required  to  and  from  prepyriform  cortex.  The  constants  c  and  d  of  eq. 
(17)  give  the  magnitude  of  the  inhibitory  bias  c  and  the  average  higher  order  weight  d.  These  constants  are 
derived  from  entries  in  the  normal  form  matrix  and,  as  we  will  discuss  below,  c  >  d  guarantees  stability 
of  all  stored  patterns.  The  greater  the  bias  c  relative  to  d,  the  greater  the  “competition”  between  stored 
patterns,  the  more  robust  is  the  stability  of  the  present  attractor,  and  vice  versa.  This  is  the  mechanism 
employed  later  in  the  biological  sensory-motor  architecture  for  central  control  of  attractor  transitions  within 
modules. 
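
The decomposition of the cubic term under the Hebbian rule can be verified numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: the stored patterns are random orthonormal vectors, the constants c and d are hypothetical values chosen with c > d, and the cubic term is taken to enter (13) with a minus sign.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5                      # number of excitatory populations (N/2)
c, d = 2.0, 1.0            # hypothetical constants with c > d

# Orthonormal stored patterns: the columns of an orthogonal matrix.
P, _ = np.linalg.qr(rng.normal(size=(n, n)))

# Multiple outer product sum_s x_i^s x_j^s x_k^s x_l^s.
outer4 = np.einsum("is,js,ks,ls->ijkl", P, P, P, P)
delta = np.eye(n)
W4 = c * np.einsum("ij,kl->ijkl", delta, delta) - d * outer4   # rule (16)

x = rng.normal(size=n)
# Cubic term as it enters (13): minus the full fourth order contraction.
cubic_13 = -np.einsum("ijkl,j,k,l->i", W4, x, x, x)
# Decomposed form of (17): outer product part plus the shunting bias.
cubic_17 = d * np.einsum("ijkl,j,k,l->i", outer4, x, x, x) - c * x * np.sum(x**2)

# Along a stored pattern the cubic term reduces to -(c - d) times the pattern.
pattern = P[:, 0]
along_pattern = -np.einsum("ijkl,j,k,l->i", W4, pattern, pattern, pattern)
```

With orthonormal patterns, the cubic term along a stored pattern is restoring exactly when c > d, which matches the stability condition quoted in the text.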

4  Normal  Form  Associative  Memory  Modules 

As  described  above,  the  network  modules  of  the  cortical  model  were  developed  previously  as  models  of 
olfactory  cortex,  or  caricatures  of  “patches” of  neocortex  [2,  5,  3].  A  particular  module  is  formed  by  a  set 
of  neural  populations  whose  interconnections  also  contain  higher  order  synapses.  These  synapses  determine 
attractors  for  that  subnetwork  independent  of  other  subnetworks.  Each  module  assumes  only  minimal 
internal  coupling  of  excitatory  and  inhibitory  neural  populations  justified  by  known  anatomy  in  prepyriform 
cortex,  as  described  earlier. 

Figure 4: Energy landscape of amplitudes of binary oscillatory unit with no external input. For low levels
of competition, there is a broad circular valley. With high levels of competition, there are deep potential
wells on each axis. Phase-locked external inputs simply add a linear tilt to the landscape which will shift
a single attractor across the circular valley at low competition, but cannot move it from a potential well at
high competition.


In  this  biological  model,  the  attractors  within  modules  are  distributed  patterns  of  activity  like  those 
observed  experimentally  [24].  However,  the  network  is  equivalent  to  the  architecture  of  modules  in  “normal 
form”  as  described  above,  and  may  easily  be  designed,  simulated,  and  theoretically  evaluated  in  these 
coordinates. 

When the network is analyzed in the polar version of the normal form coordinates, the amplitude and phase
dynamics have a particularly simple interaction. When the input to a module is synchronized with its
intrinsic oscillation, the amplitudes of the periodic activity may be considered separately from the phase
rotation, and the network of the module may be viewed as a static network with these amplitudes as its
activity. We have further shown analytically that the network modules we have constructed have a strong
tendency to synchronize as required. An attractor in these winner-take-all normal form coordinates is one
oscillator at its maximum amplitude, with the others near zero amplitude.

For  the  oscillating  networks  of  the  present  model,  we  use  the  normal  form  for  the  Hopf  bifurcation,  which 
characterizes  the  genesis  of  oscillation,  and  make  particular  use  of  it  to  control  bifurcations.  A  bifurcation 
is  a  discontinuous  (topologically  singular)  change  in  the  phase  portrait  of  possibilities  for  the  continuous 
dynamical behavior of a system (such as the appearance or disappearance of an attractor) that occurs as a
bifurcation  parameter  reaches  a  critical  value.  It  is  the  bifurcation  in  the  vector  field  of  a  network  module 
from  one  to  many  attractors  that  effects  the  essential  digitization  of  the  system  in  time  and  state,  as  we  will 
see  below. 
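
As background, the textbook single-oscillator case illustrates what the bifurcation parameter does. The form below is the standard supercritical Hopf normal form in polar coordinates, stated here with coupling and inputs dropped and with $c > 0$ assumed; the symbols $u$, $c$ and $\omega$ are used in the same roles as in the unit equations of the next section.

```latex
\dot{r} = u\,r - c\,r^{3}, \qquad \dot{\theta} = \omega
% u < 0 : the rest state r = 0 is the only attractor
% u > 0 : r = 0 is unstable and the limit cycle r = \sqrt{u/c} attracts
```

The change from a single resting attractor to an oscillation of finite amplitude therefore occurs at the critical value $u = 0$.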

4.1  A  Binary  Oscillatory  Unit 

To  illustrate  the  behavior  of  individual  network  modules,  we  examine  a  binary  (two  attractor)  module;  the 
behavior  of  modules  with  more  than  two  attractors  is  similar.  Such  a  unit  is  defined  in  polar  normal  form 
coordinates  by  the  following  equations  of  the  Hopf  normal  form: 

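$$\dot{r}_{1i} = u_i r_{1i} - c\, r_{1i}^3 + (d - b\sin(\omega_{clock} t))\, r_{1i} r_{0i}^2 + \sum_j w_{ij}^{+} I_j \cos(\theta_j - \theta_{1i})$$

$$\dot{r}_{0i} = u_i r_{0i} - c\, r_{0i}^3 + (d - b\sin(\omega_{clock} t))\, r_{0i} r_{1i}^2 + \sum_j w_{ij}^{-} I_j \cos(\theta_j - \theta_{0i})$$

$$\dot{\theta}_{1i} = \omega_i + \sum_j w_{ij}^{+} (I_j / r_{1i}) \sin(\theta_j - \theta_{1i})$$

$$\dot{\theta}_{0i} = \omega_i + \sum_j w_{ij}^{-} (I_j / r_{0i}) \sin(\theta_j - \theta_{0i})$$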

The clocked parameter $b\sin(\omega_{clock} t)$ is used to control attractor transitions in the Elman architecture to
be discussed later. It has lower frequency (1/10) than the intrinsic frequency of the unit $\omega_i$.

When the oscillators are synchronized with the input, $\theta_j - \theta_{1i} = 0$, and the phase terms $\cos(\theta_j - \theta_{1i}) = \cos(0) = 1$
disappear. This leaves the amplitude equations $\dot{r}_{1i}$ and $\dot{r}_{0i}$ with static inputs $\sum_j w_{ij}^{\pm} I_j$.

Examination of the phase equations shows that a unit has a strong tendency to synchronize with an input
of similar frequency. Defining the phase difference $\phi = \theta_0 - \theta_I = \theta_0 - \omega_I t$ between a unit $\theta_0$ and its input
$\theta_I$, we can write a differential equation $\dot{\phi}$ for the phase difference $\phi$,

$$\dot{\phi} = \omega_0 - \omega_I + (r_I / r_0)\sin(-\phi), \quad \text{so} \quad \hat{\phi} = -\sin^{-1}\left[(r_0 / r_I)(\omega_I - \omega_0)\right]$$

There is an attractor at zero phase difference $\phi = \theta_0 - \theta_I = 0$, and a repellor at 180 degrees in the
phase difference equations $\dot{\phi}$ for either side of a unit driven by an input of the same frequency, $\omega_I - \omega_0 = 0$.
In  simulations,  the  interconnected  network  of  these  units  to  be  described  below  synchronizes  robustly  within 
a  few  cycles  following  a  perturbation. 

If the frequencies of attractors in some modules of the architecture are randomly dispersed by a significant
amount, $\omega_I - \omega_0 \neq 0$, phase-lags appear first, then synchronization is lost in those units. An oscillating module
therefore acts as a band pass filter for oscillatory inputs.
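
The locking behavior described above can be reproduced by integrating the phase difference equation directly. This is a minimal sketch under assumptions: the amplitude ratio r_I / r_0 is frozen at a hypothetical value of 1, the detunings are arbitrary illustrative numbers, and simple Euler integration is used. The function name `phase_difference` is introduced here for illustration.

```python
import math

def phase_difference(delta_omega, gain, phi0=2.0, dt=0.001, t_end=60.0):
    """Euler-integrate d(phi)/dt = delta_omega - gain * sin(phi),
    where delta_omega = omega_0 - omega_I and gain = r_I / r_0."""
    phi = phi0
    for _ in range(int(t_end / dt)):
        phi += dt * (delta_omega - gain * math.sin(phi))
    return phi

gain = 1.0                                 # r_I / r_0, hypothetical value
locked = phase_difference(0.5, gain)       # detuning inside the passband
drifting = phase_difference(1.5, gain)     # detuning outside the passband
predicted_lag = math.asin(0.5 / gain)      # equilibrium phase lag when locked
```

When the detuning is smaller in magnitude than the gain, the phase difference settles at a constant lag; when it is larger, no equilibrium exists and the phase difference grows without bound, which is the loss of synchronization referred to in the text.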

Thus  we  have  network  modules  which  emulate  static  network  units  in  their  amplitude  activity  when  fully 
phase-locked  to  their  input.  Amplitude  information  is  transmitted  between  modules,  with  an  oscillatory 
carrier. 


4.2  Attractor  Transitions  by  Bifurcation 

For  fixed  values  of  the  competition,  in  a  completely  synchronized  system,  the  internal  amplitude  dynamics 
define  a  gradient  dynamical  system  for  a  fourth  order  energy  function.  Figures  4a  and  4b  show  the  energy 
landscape  with  no  external  input  for  high  and  low  levels  of  competition  respectively.  External  inputs  that 
are  phase-locked  to  the  module’s  intrinsic  oscillation  simply  add  a  linear  tilt  to  the  landscape. 
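
One energy function consistent with the synchronized amplitude equations can be written down explicitly. The expression below is a sketch under assumptions: the unit index is dropped, $k$ is shorthand (introduced here) for the value of $d - b\sin(\omega_{clock} t)$ held fixed, and $I_1$, $I_0$ are shorthand for the summed static inputs to the two sides of the unit.

```latex
E(r_1, r_0) = -\frac{u}{2}\left(r_1^{2} + r_0^{2}\right)
            + \frac{c}{4}\left(r_1^{4} + r_0^{4}\right)
            - \frac{k}{2}\, r_1^{2} r_0^{2}
            - I_1 r_1 - I_0 r_0,
\qquad
\dot{r}_1 = -\frac{\partial E}{\partial r_1}, \quad
\dot{r}_0 = -\frac{\partial E}{\partial r_0}
```

When $k = -c$ the quartic terms combine into $\frac{c}{4}(r_1^2 + r_0^2)^2$, so the energy depends only on the radius and the minimum is a circular valley. For $k < -c$ the on-axis states become stable wells, and the input terms contribute the linear tilt.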

For  low  levels  of  competition,  there  is  a  broad  circular  valley.  When  tilted  by  external  input,  there  is  a 
unique equilibrium that is determined by the bias in tilt along one axis over the other. Thinking of $r_{1i}$ as
the “activity” of the unit, this activity becomes a monotonically increasing function of input. The module
behaves as an analog connectionist unit whose transfer function can be approximated by a sigmoid. We refer
to this as the “analog” mode of operation of the module.

With  high  levels  of  competition,  the  unit  will  behave  as  a  binary  (bistable)  digital  flip-flop  element.  There 
are  two  deep  potential  wells,  one  on  each  axis.  Hence  the  final  steady  state  of  the  unit  is  determined  by 
which  basin  of  attraction  contains  the  initial  state  of  the  system  in  the  analog  mode  of  operation  before 
competition  is  increased  by  the  clock.  This  state  changes  little  under  the  influence  of  external  input:  a  tilt 
will  move  the  location  of  the  attractor  basins  only  slightly.  Hence  the  module  performs  a  winner-take-all 
choice  on  the  coordinates  of  its  initial  state  and  maintains  that  choice  independent  of  external  input.  This 
is  the  “digital”  or  “quantized”  mode  of  operation  of  a  module.  We  use  this  bifurcation  in  the  behavior  of 
the  modules  to  control  information  flow  within  the  network  to  be  described  below. 
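
The two modes of operation can be illustrated by integrating the synchronized amplitude equations of a single binary unit at two fixed competition levels. This is a sketch under assumptions rather than the simulation used in this work: all numerical values are hypothetical, the argument `cross` stands for the value of (d - b sin(w_clock t)) frozen at one clock phase (more negative means stronger competition), and the inputs are treated as static.

```python
def settle(r1, r0, in1, in0, cross, u=1.0, c=1.0, dt=0.01, steps=20000):
    """Euler-integrate the synchronized amplitude equations of one binary unit.

    cross plays the role of (d - b*sin(w_clock*t)) frozen at one clock phase.
    """
    for _ in range(steps):
        dr1 = u * r1 - c * r1**3 + cross * r1 * r0**2 + in1
        dr0 = u * r0 - c * r0**3 + cross * r0 * r1**2 + in0
        r1, r0 = r1 + dt * dr1, r0 + dt * dr0
    return r1, r0

LOW, HIGH = -0.8, -3.0   # hypothetical cross-coupling values

# Analog mode: the settled activity r1 rises with the input to that side.
analog = [settle(0.5, 0.5, in1, 0.1, LOW)[0] for in1 in (0.0, 0.1, 0.2, 0.3)]

# Digital mode: the initial winner survives an opposing input.
kept1 = settle(0.9, 0.1, 0.0, 0.2, HIGH)
kept0 = settle(0.1, 0.9, 0.2, 0.0, HIGH)
```

At the low setting both amplitudes remain nonzero and shift smoothly with the input; at the high setting the side that starts ahead stays near full amplitude even though the input favors the other side.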

5  Sensory-Motor  Architecture 

As  a  benchmark  for  the  capabilities  of  the  system,  and  to  create  a  point  of  contact  to  standard  network 
architectures,  we  have  shown  how  a  discrete-time  “simple  recurrent”  or  “Elman”  network  architecture  [21]  can 
be  constructed  from  recurrently  connected  oscillatory  associative  memory  modules  described  by  continuous 
nonlinear ordinary differential equations [14, 11]. The system learns to function as a finite state automaton
that  recognizes  or  generates  the  infinite  set  of  six  symbol  strings  that  are  defined  by  a  Reber  grammar. 
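
For readers unfamiliar with the task, the sketch below generates and recognizes strings of the commonly cited textbook form of the Reber grammar. It is an assumption that this table matches the grammar used here: the textbook form includes a begin marker B, whereas the variant in this work uses six symbols and a shared start/end state, so details may differ. The names `REBER`, `generate` and `accepts` are introduced for illustration.

```python
import random

# Textbook Reber grammar transition table: state -> {symbol: next_state}.
# None marks the end of a string.
REBER = {
    0: {"B": 1},
    1: {"T": 2, "P": 3},
    2: {"S": 2, "X": 4},
    3: {"T": 3, "V": 5},
    4: {"X": 3, "S": 6},
    5: {"P": 4, "V": 6},
    6: {"E": None},
}

def generate(rng):
    """Walk the automaton, choosing uniformly among the allowed symbols."""
    state, out = 0, []
    while state is not None:
        symbol = rng.choice(sorted(REBER[state]))
        out.append(symbol)
        state = REBER[state][symbol]
    return "".join(out)

def accepts(string):
    """Return True if the string follows the table from start to end."""
    state = 0
    for symbol in string:
        if state is None or symbol not in REBER[state]:
            return False
        state = REBER[state][symbol]
    return state is None

sample = generate(random.Random(0))
```

Because two of the states loop on themselves, the set of valid strings is infinite even though the automaton has only a handful of states.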

The time steps (sensory-motor cycles) of the system are implemented by rhythmic variation (clocking) of
the competition bifurcation parameter. This holds input and “context” (sensory) modules clamped at their
attractors while hidden and output (motor) modules change state, then clamps hidden and output states
while context modules are released to load those states as the new context for the next cycle of input. The
dotted lines of figure 5 show these two sets of modules.

Figure 5: Elman architecture: The input and output modules each consist of a single associative memory
module with six oscillatory attractors, one for each of the six symbols in the grammar. The hidden and
context modules consist of binary “units” composed of two oscillatory attractors. Dotted lines enclose the
two “sensory” and “motor” sets of modules which are allowed to change attractors at alternate peaks of the
machine cycle.

We  use  two  types  of  modules  in  implementing  the  Elman  network  architecture.  The  input  and  output 
layer  each  consist  of  a  single  associative  memory  module  with  six  oscillatory  attractors  (six  competing 
oscillatory  modes),  one  for  each  of  the  six  possible  symbols  in  the  grammar.  The  hidden  and  context  layers 
consist  of  binary  “units”  composed  of  a  two  oscillator  module  with  internal  competition.  We  think  of  one 
mode  within  the  unit  as  representing  “1”  and  the  other  as  representing  “0”  (see  figure  5). 

The network approximates a static network in its amplitude activity when fully phase-locked. Amplitude
information is transmitted between modules, with an oscillatory carrier. If the frequencies of attractors in
the architecture are randomly dispersed by a significant amount, phase-lags appear, then synchronization is
lost and improper transitions begin to occur.

It  is  the  bifurcation  in  the  phase  portrait  of  a  module  from  one  to  many  attractors  that  contributes 
the essential digitization of the system in time and state. The analog mode for a module allows input to
prepare its initial state for the binary decision between attractor basins that occurs as competition rises and
the  double  potential  well  appears. 

The  feedback  between  sensory  and  motor  modules  is  effectively  cut  when  one  set  is  clamped  at  high 
competition.  The  system  thus  operates  in  discrete  time  by  alternating  sets  of  transitions  between  finite  sets 
of  attracting  states.  This  kind  of  alternate  clocking  and  buffering  (clamping)  of  some  states  while  other 
states  relax  is  essential  to  the  reliable  operation  of  digital  architectures,  as  it  is  in  our  modules.  The  clock 
input on a flip-flop clamps its state until its signal inputs have settled and the choice of transition is made
with the proper information available. In our simulations, if we clock all modules to transition at once, the
programmed transitions lose stability, and we get transitions to unprogrammed fixed points and simple limit
cycles  for  the  whole  system.  This  is  a  strong  justification  for  the  use  of  clamped  attractors  and  clocked 
cycles. 
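
The two-phase machine cycle can be summarized by a discrete-time abstraction. The sketch below deliberately swaps in static sigmoid units for the oscillatory modules, so it shows only the order of clamping and release, not the continuous dynamics. The weights are random placeholders, and the function name `machine_cycle` and the 0.5 threshold are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def machine_cycle(symbol, context, W_in, W_ctx, W_out):
    """One sensory-motor cycle of a simple recurrent (Elman) network."""
    # Phase 1: input and context are clamped; hidden and output settle.
    hidden = sigmoid(W_in @ symbol + W_ctx @ context)
    output = sigmoid(W_out @ hidden)
    # Phase 2: hidden and output are clamped; context loads the hidden state
    # through the identity map, quantized here to 0 or 1.
    new_context = (hidden > 0.5).astype(float)
    return output, new_context

rng = np.random.default_rng(0)
n_symbols, n_hidden = 6, 3
W_in = rng.normal(size=(n_hidden, n_symbols))    # placeholder weights
W_ctx = rng.normal(size=(n_hidden, n_hidden))
W_out = rng.normal(size=(n_symbols, n_hidden))

context = np.zeros(n_hidden)
for index in [0, 3, 5]:                          # an arbitrary symbol sequence
    symbol = np.eye(n_symbols)[index]            # one attractor per symbol
    output, context = machine_cycle(symbol, context, W_in, W_ctx, W_out)
```

Updating hidden and context states in separate phases is what keeps each transition a function of a settled, well-defined previous state.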


5.1  Training 

During training, the hidden module units are left at zero or negative competition after clamping of input
and  context.  They  are  thus  free  to  take  analog  values  on  a  given  time  step  so  that  a  real  valued  error  can 
be  defined  and  backpropagation  may  be  used  to  train  the  system. 

If  the  context  units  are  clamped  with  high  competition,  they  are  essentially  “quantized”  to  take  on 
only  their  0  or  1  attractor  values,  and  the  feedback  connections  from  the  hidden  units  cannot  affect  them. 
While Giles et al. generally do not quantize their units until the end of training to extract a finite state
automaton, they and others [31] find that quantizing of the context units during training like this increases
learning time but produces a network with perfect performance.

We  also  have  the  option  of  leaving  the  competition  within  the  context  units  at  intermediate  levels  to  allow 
them  to  take  on  analog  values  in  a  variable  sized  neighborhood  of  the  0  or  1  attractors.  Since  our  system 
is  recurrently  connected  by  an  identity  map  from  hidden  to  context  units,  it  will  relax  to  some  equilibrium 
determined  by  the  impact  of  the  context  units  and  the  clamped  input  on  the  hidden  unit  states,  and  the 
effect  of  the  feedback  from  those  hidden  states  on  the  context  states.  We  can  thus  further  explore  the  impact 
on  learning  of  this  range  of  operation  between  discrete  time  and  space  automaton  and  continuous  analog 
recurrent  network. 

The ability to operate as a finite automaton with oscillatory/chaotic “states” is thus an important
benchmark for this architecture, but only a subset of its capabilities. At low to zero competition, the
supra-system reverts to one large continuous dynamical system. We expect that this kind of variation of
the operational regime, especially with chaotic attractors inside the modules, though unreliable for habitual
behaviors, may nonetheless be very useful in other areas such as the search process of reinforcement learning.

5.2  Synchronization,  Noise,  and  Intermodule  Communication

An important element of intra-cortical communication in the brain, and between modules in this architecture,
is the ability of a module to detect and respond to the proper input signal from a particular module, when
inputs from other modules which are irrelevant to the present computation are contributing cross-talk and
noise. This is similar to the problem of coding messages in a computer architecture like the Connection
Machine so that they can be picked up from the common communication bus line by the proper receiving
module. We are investigating the hypothesis that synchronization control is one way the brain can solve this
coding problem.

Because communication between modules in the architecture is by continuous time-varying analog vectors,
the process is more one of signal detection and pattern recognition by the modules of their inputs than it
is “message passing”. This is why the demonstrated performance of the modules in handwritten character
recognition  is  significant,  and  why  we  expect  there  are  important  possibilities  in  the  architecture  for  the 
kinds  of  chaotic  signal  processing  studied  by  Chua  [22]. 

We have shown that the dynamic attractors - oscillatory or chaotic - within the modules of this architecture
must synchronize to effectively communicate information and produce reliable transitions [10].
In simulations, we have synchronized Lorenz and Chua attractors for operation in the architecture using
techniques  of  coupling  developed  by  Chua  [22]  for  secure  “broadspectrum”  communication  by  a  modulated 
chaotic  carrier  wave  [11,  13]. 

We  have  also  shown  in  these  modules  a  superior  stability  of  oscillatory  attractors  over  static  attractors 
in  the  presence  of  noise  perturbations  with  the  1/f  spectral  character  of  the  noise  found  experimentally  by 
Freeman  in  the  brain  [10].  This  may  be  one  reason  why  the  brain  uses  dynamic  attractors.  An  oscillatory 
attractor acts like a bandpass filter and is effectively immune to the many slower macroscopic bias perturbations
in the theta-alpha-beta range (3 - 25 Hz) below its 40 - 80 Hz passband, and the more microscopic
perturbations of single neuron spikes in the 100 - 1000 Hz range. In an environment with this spectrum of
perturbation,  modules  with  static  attractors  cannot  operate  reliably. 

5.3  Attentional  Control  of  Synchrony 

The  network  architecture,  shown  in  figure  6,  has  been  designed  so  that  amplitude  codes  the  information 
content  or  activity  of  a  module,  whereas  phase  and  frequency  are  used  to  “softwire”  the  network.  An 
oscillatory  network  module  has  a  passband  outside  of  which  it  will  not  synchronize  with  an  oscillatory 
input.  Modules  can  therefore  easily  be  desynchronized  by  perturbing  their  resonant  frequencies.  They  can 
also be desynchronized by anti-phase inputs as in the models of Koenig et al. [?]. Furthermore, only
synchronized modules communicate by exchanging amplitude information; the activity of non-resonating
modules contributes incoherent crosstalk or noise.
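
Why desynchronized modules exchange no net amplitude information can be seen from the cosine factor that multiplies every input in the amplitude equations. The sketch below estimates the time average of that factor for a locked and an unlocked phase difference. It is an illustration under assumptions: the amplitude ratio is frozen at 1, the detuning values are hypothetical, and the function name `mean_coupling` is introduced here.

```python
import math

def mean_coupling(detuning, gain=1.0, dt=0.001, t_end=200.0):
    """Time average of cos(phi), where d(phi)/dt = detuning - gain * sin(phi).

    cos(phi) is the factor that scales an input in the amplitude equations.
    """
    phi, total, steps = 0.0, 0.0, int(t_end / dt)
    for _ in range(steps):
        phi += dt * (detuning - gain * math.sin(phi))
        total += math.cos(phi)
    return total / steps

in_band = mean_coupling(0.2)       # small detuning: the phases lock
out_of_band = mean_coupling(5.0)   # large detuning: the phase keeps drifting
```

For the locked case the average factor stays close to one, so amplitude is transmitted almost unchanged. For the drifting case the factor alternates in sign and averages out, leaving only a fluctuating contribution of the kind the text describes as crosstalk.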

The flow of communication between modules can thus be controlled by controlling synchrony. By changing
the intrinsic frequency of modules in a patterned way, the effective connectivity of the network is changed.
The same hardware and connection matrix can thus subserve many different computations and patterns of
interaction between modules without crosstalk problems.

Figure 6: Synchronization control architecture: The input and output modules show the symbol ‘T’ as
a distributed attractor pattern. The binary modules of the hidden and context layers show oscillatory
attractors in winner-take-all normal form coordinates, where one oscillator is at its maximum amplitude, with
the others near zero amplitude. Activity levels oscillate up and down through the plane of the paper. Dotted
lines show control outputs from the synchronization control modules. Control unit two is at the one attractor
(right side of the square active) and the hidden units coding for states of subgraph two are in synchrony
with the input and output modules. Here in midcycle, all modules are clamped at their attractors.

The  crosstalk  noise  is  actually  essential  to  the  function  of  the  system.  It  serves  as  the  noise  source 
for  making  random  choices  of  output  symbols  and  automaton  state  transitions  in  this  architecture  during 
reinforcement  learning  and  normal  operation  after  learning.  In  cortex  there  is  an  issue  as  to  what  may 
constitute  a  source  of  randomness  of  sufficient  magnitude  to  perturb  the  behavior  of  the  large  ensemble  of 
neurons  involved  in  neural  activity  at  the  cortical  network  level.  It  does  not  seem  likely  that  the  well  known 
molecular  level  of  fluctuations  which  is  easily  averaged  within  a  single  neuron  or  small  group  of  neurons 
can  do  the  job.  The  architecture  here  models  the  hypothesis  that  deterministic  chaos  in  the  macroscopic 
dynamics of a network of neurons, which is the same order of magnitude as the coherent activity, can serve
this  purpose. 

In a set of modules which is desynchronized by perturbing the resonant frequencies of the group, coherence
is lost and “random” phase relations result. The character of the model time traces is now irregular as seen
in real neural ensemble activity. The behavior of the time traces in different modules of the architecture
is similar to the temporary appearance and switching of synchronization between cortical areas as seen in
observations of cortical processing during sensory/motor tasks in monkeys and humans [17]. The detailed
structure  of  this  apparently  chaotic  signal  and  its  further  use  in  network  learning  and  operation  are  currently 
under  investigation. 

5.4  Grammatical  Inference 

We studied the use of these capabilities in the grammatical inference problem by constructing and learning
the larger fifteen hidden unit (module) automaton studied by Cleeremans et al. This consists of two subgraphs,
each of which was the automaton learned previously in work described above, and is shown in figure 7. Strings
of this grammar can contain long embedded sequences of the smaller grammar before the final transition
distinguishing which branch you are on appears. The transitions of this grammar were challenging to learn
because of the embedding. Cleeremans et al. had to alter the transition probabilities within the two subgraphs
so that the backpropagation algorithm could distinguish the branches during learning.

Figure 7: Graph diagram of the automaton emulated by the network to generate the symbol strings of a
grammar. It is composed of two subgraphs joined by a start/end state. At each node (network state), one
of two symbols (output module attractors) is chosen at random (by crosstalk noise) and fed back as input to
the network to direct the next transition of state as shown by the arrows of the diagram.

We  solved  this  learning  problem  by  introducing  a  control  of  program  flow  by  selective  synchronization 
[15, 16]. The controller itself is modeled in this architecture as a special set of hidden modules with outputs
that  affect  the  resonant  frequencies  of  the  other  hidden  modules  or  supply  an  anti-phase  input,  as  shown  in 
figure  6. 

These  enforce  a  segregation  of  the  hidden  module  code  for  the  subgraph  states  during  training  so  that 
different  sets  of  synchronized  modules  learn  to  code  for  each  subgraph  with  the  other  modules  desynchronized 
by  frequency  perturbation.  The  entire  automaton  is  learned  with  its  additional  entry  and  exit  hidden  module 
states  and  with  these  special  hidden  modules. 

Transitions  in  the  system  from  states  in  one  subautomaton  to  the  other  are  made  by  “attending”  to 
the  corresponding  set  of  nodes  in  the  hidden  and  context  layers.  This  switching  of  the  focus  of  attention  is 
accomplished  by  changing  the  patterns  of  synchronization  within  the  network. 

Varying levels of intramodule competition control the large scale direction of information flow between
layers  of  the  architecture.  To  direct  information  flow  on  a  finer  scale,  the  “attention”  mechanism  selects  a 
subset  of  modules  within  each  layer  whose  output  is  effective  in  driving  the  behavior  of  the  system. 

The  system  in  operation  is  made  to  jump  from  states  in  one  subautomaton  to  the  other  by  desynchronizing 
the proper subset of hidden modules. The possibilities for transition of the system are thus controlled by
selective  synchronization.  This  control  itself  is  learned  by  the  special  hidden  units  whose  output  controls 
the  synchrony  of  these  subsets.  During  training,  the  control  modules  learn  to  respond  to  the  proper  input 
symbol  and  context  to  direct  the  flow  of  computation  to  effect  the  difficult  transitions  between  subgraphs. 
Viewing  the  automaton  above  as  a  behavioral  program,  the  control  of  synchrony  constitutes  a  control  of  the 
program  flow  into  its  subprograms  (the  subgraphs). 

In future work we will investigate the possibilities for self-organization of the patterns of synchrony and
spatially segregated coding in the hidden layer during learning. We will explore the use of lateral connections
between hidden units to cause competition for synchrony, as has been done by Koenig et al., to see how local
spatially segregated coding can be self-organized during learning. Lateral neighborhood connections between
hidden units which effect synchrony of neighboring units have been successfully implemented in simulations,
and certain sections of a large number of hidden units will self-organize into synchrony and take part in the
learning of certain subgraphs of the automaton.


6  Computing  Resources 

Our analytic approach to understanding these networks relies heavily on geometric visualization of network
learning and operation in preferred coordinate systems. The computer graphic capabilities of the Silicon
Graphics Personal Iris 4D35G workstation purchased under the grant have been invaluable in enabling us to
design interactive simulations with graphical display of these geometric representations in order to enhance
our intuition and generate new theoretical insights.

We  have  employed  the  workstation  as  a  system  for  simulation  and  graphic  display  of  network  dynamics, 
where  we  can  vary  network  parameters  (most  notably  bifurcation  parameters)  and  alter  network  dynamics 
in  real  time.  With  this  capability,  we  were  able  to  rapidly  explore  regions  of  the  parameter  space,  and  find 
where  to  concentrate  our  numerical  and  analytical  efforts. 
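
The kind of parameter exploration described above can be sketched as a sweep over a single bifurcation parameter. The example assumes the radial equation of a supercritical Hopf normal form, dr/dt = μr − r³, as a stand-in system; it is not the network simulated on the workstation, and the function name, step size, and sweep values are hypothetical. For μ > 0 the oscillation amplitude should settle at √μ, and for μ < 0 it should decay to zero.

```python
import math

def settled_amplitude(mu, r0=0.1, duration=50.0, dt=0.01):
    """Euler-integrate dr/dt = mu*r - r**3 and return the final amplitude r."""
    r = r0
    for _ in range(int(round(duration / dt))):
        r += dt * (mu * r - r ** 3)
    return r

# Sweep the bifurcation parameter across the transition at mu = 0.
sweep = {mu: settled_amplitude(mu) for mu in (-0.5, -0.25, 0.25, 0.5, 1.0)}
for mu, amp in sweep.items():
    predicted = math.sqrt(mu) if mu > 0 else 0.0
    print(f"mu={mu:+.2f}  simulated={amp:.4f}  predicted={predicted:.4f}")
```

A coarse sweep of this kind shows where the qualitative behavior changes, which indicates the region of parameter space that deserves finer numerical or analytical study.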

7  Invited  Talks  and  Conferences 

1991 

Plenary speaker, Applications of Artificial Intelligence and Neural Networks, SPIE, Orlando, Fla., April

Invited  lecture,  Stanford  University,  Computer  Science  Dept.,  Palo  Alto,  Ca.,  April 

International  Joint  Conference  on  Neural  Networks,  Seattle,  Wa.,  June 

Invited  Speaker,  Workshop  on  Complex  Dynamics  in  Neural  Networks,  Vietri,  Italy,  June 

Analysis  and  Modeling  of  Neural  Systems  2,  U.C.  Berkeley,  Ca.,  July 

Neural  Information  Processing  Systems  -  Natural  and  Synthetic,  Denver,  Colo.,  November 

1992 

Invited speaker, 2nd Int. Conf. on Fuzzy Logic and Neural Networks, Iizuka, Japan, July

Computation and Neural Systems *92, San Francisco, Ca., July

Neural  Information  Processing  Systems  -  Natural  and  Synthetic,  Denver,  Colo.,  November 

1993 

Invited  lecture,  RICOH  Palo  Alto  Research  Center  -  January 

Invited speaker, International Workshop - Self-Organization, Learning and Dynamics in Neural Networks,
Toulouse, France, March

Invited lecture, Computer Science Dept., University of New South Wales, Sydney, Australia, June

Invited lecture, Computer Science Dept., University of Queensland, Brisbane, Australia, June

Invited speaker, Midwest Dynamics Conference, University of California at Berkeley, Berkeley, Ca., October

Neural  Information  Processing  Systems  -  Natural  and  Synthetic,  Denver,  Colo.,  November 
*talk  or  poster  given  at  all  conferences  listed 